-
Notifications
You must be signed in to change notification settings - Fork 10k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix copying on supplementary plane characters #10530
Conversation
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/868b3ff72df751a/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @timvandermeij received. Current queue size: 0 Live output at: http://54.215.176.217:8877/fdf1a1b173182d7/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/868b3ff72df751a/output.txt Total script time: 17.81 mins
|
From: Bot.io (Windows)FailedFull output at http://54.215.176.217:8877/fdf1a1b173182d7/output.txt Total script time: 25.50 mins
Image differences available at: http://54.215.176.217:8877/fdf1a1b173182d7/reftest-analyzer.html#web=eq.log |
The Windows failures do not look related to this patch. @Snuffleupagus Could you review this? Also, would a "text" reference test catch this? If so, it should be added. |
/botio-linux preview |
From: Bot.io (Linux m4)ReceivedCommand cmd_preview from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/e6e5c9fe25609c8/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/e6e5c9fe25609c8/output.txt Total script time: 1.70 mins Published |
Firefox was updated on the bot, this is the "usual" sort of fallout from that.
Yes it will[1], and a test-case is a requirement for accepting the patch. [1] Try adding |
OK, I'm working on the test. |
Oh, making the test was harder than I expected (I could not figure out what "text" tests do). I just gave up. Still, I can ... at least regenerate test PDF file only with free/open source fonts (Source Sans Pro and Hanazono Mincho): issue10529.pdf |
I read the "Contributing" page again and added a testcase. I'm not sure this is okay though. |
I ran Here is the reference image before/after the fix on Ubuntu 18.10 + additional fonts (as described in "Bots") + Firefox 65.0. I haven't tested it on Windows but I think it will not show missing glyphs because of preinstalled PMingLiU-ExtB font. |
pdf.js had a problem when copying characters on supplementary planes (0xPPXXXX where PP is nonzero). This is because certain methods of PartialEvaluator use classic String.fromCharCode instead of ES6's String.fromCodePoint. Despite the fact that readToUnicode method *tried* to parse out-of-UCS2 code points by parsing UTF-16BE, it was inadequate because String.fromCharCode only supports UCS-2 range of Unicode.
3dc8d97
to
96ba6af
Compare
Fixed code style and squashed. |
/botio makeref |
From: Bot.io (Linux m4)ReceivedCommand cmd_makeref from @timvandermeij received. Current queue size: 0 Live output at: http://54.67.70.0:8877/17a6f063e58269a/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_makeref from @timvandermeij received. Current queue size: 0 Live output at: http://54.215.176.217:8877/5155dc9f2b52363/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/17a6f063e58269a/output.txt Total script time: 16.14 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/5155dc9f2b52363/output.txt Total script time: 22.69 mins
|
Thank you for fixing this! |
Hi! I have a little question: Does this execute when copying text in the viewer or when building the text layer? |
This runs when the text layer is built and therefore it impacts the way the text can be copied. |
Awesome, thanks for clarifying! |
When I copy text from PDF displayed on pdf.js, only lower 16-bit of code points are copied (upper bits [that require surrogate pair] are ignored). See an example on issue #10529.
This is because PartialEvaluator.readToUnicode method used classic String.fromCharCode (which does not support out-of-UCS2 code points) instead of ES6's String.fromCodePoint (which does).
Merging this pull request replaces
String.fromCharCode
toString.fromCodePoint
where necessary (two of four occurrences inevaluator.js
).